Integrating Vocabularies: Discovering and Representing Vocabulary Maps
Abstract
The Semantic Web would enable new ways of doing business on the Web, which require advanced business document integration technologies that perform intelligent document transformation. The documents use different vocabularies that consist of large hierarchies of terms. Accordingly, vocabulary mapping and transformation becomes an important task in the overall business document transformation process. This task includes several subtasks (map discovery, map representation, and map execution) that must be seamlessly integrated into the document integration process. In this paper we discuss the process of discovering the maps between two vocabularies, assuming the availability of two sets of documents, each using one of the vocabularies. We take vocabularies of product classification codes as a test case and propose a reusable map discovery technique based on a Bayesian text classification approach. We show how the discovered maps can be integrated into the document transformation process.
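As an illustration of the map discovery idea described in the abstract, the following is a minimal sketch and not the paper's actual method: a multinomial naive Bayes classifier with add-one smoothing is trained on documents labeled with codes from one vocabulary, then used to classify documents labeled with codes from the other, and each code of the second vocabulary is mapped to the code it is most often classified as. The class and function names, the majority-vote step, the product codes (`A-PEN`, `B-100`, and so on), and the sample descriptions are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; real product descriptions would need more care.
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, docs):
        for text, label in docs:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words unseen in training are ignored
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def discover_map(docs_a, docs_b):
    """Map each code of vocabulary B to the vocabulary A code that a
    classifier trained on docs_a predicts most often for B's documents."""
    classifier = NaiveBayes()
    classifier.fit(docs_a)
    votes = defaultdict(Counter)
    for text, code_b in docs_b:
        votes[code_b][classifier.predict(text)] += 1
    return {code_b: c.most_common(1)[0][0] for code_b, c in votes.items()}


# Hypothetical product descriptions, each labeled with a code from one vocabulary.
docs_a = [
    ("blue ballpoint pen", "A-PEN"),
    ("black gel pen ink", "A-PEN"),
    ("white copy paper ream", "A-PAPER"),
    ("recycled printer paper", "A-PAPER"),
]
docs_b = [
    ("red ballpoint pen", "B-100"),
    ("gel ink pen", "B-100"),
    ("printer paper ream", "B-200"),
]

mapping = discover_map(docs_a, docs_b)
print(mapping)
```

In this sketch the map is one-directional (from the second vocabulary to the first) and assigns exactly one target code per source code; a hierarchical vocabulary would likely also need to consider mapping to parent categories when the votes are split.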
Related resources
Changing Controlled Vocabularies
For the foreseeable future, controlled medical vocabularies will be in a constant state of development, expansion and refinement. Changes in controlled vocabularies must be reconciled with historical patient information which is coded using those vocabularies and stored in clinical databases. This paper explores the kinds of changes that can occur in controlled vocabularies, including adding te...
Linguistic Watermark 3.0: An RDF Framework and a Software Library for Bridging Language and Ontologies in the Semantic Web
In this paper, we present a framework for representing heterogeneous linguistic resources and for integrating their content with Semantic Web ontologies. This work, which extends and improves previous research conducted by these same authors, articulates into two main results: first, a set of coordinated RDF vocabularies providing descriptors for representing linguistic resources and their soft...
Vocabulary Conversion: Performance with Controlled and Uncontrolled Terms and Tags Technical
Controlled and uncontrolled indexing terminology and metadata may be converted from one to another. Decision criteria are developed that can be used to determine which terms should be assigned when converting vocabularies. Methods are developed for computing the parameters of these systems, as well as means for estimating the parameters when given limited information. These conversion technique...
Creating an Order in Distributed Digital Libraries by Integrating Independent Self-Organizing Maps
Digital document libraries are an almost perfect application arena for unsupervised neural networks. This is because many of the operations computers have to perform on text documents are classification tasks based on "noisy" input patterns. The "noise" arises because of the known inaccuracy of mapping natural language to an indexing vocabulary representing the contents of the documents. A growing...